feat(ingestion): cross-connector filter visibility report by harshach · Pull Request #28355 · open-metadata/OpenMetadata

harshach · 2026-05-21T20:40:14Z

Describe your changes:

Generalizes #28336's per-database discovery + filter-reason logging into a reusable helper that every connector family — Database, Dashboard, Pipeline, Messaging, Storage, MLModel — now uses. At end of each source step it emits a single grep-friendly FILTER VISIBILITY REPORT block listing the discovered count, every filtered name + reason, and the kept count per entity type, so users can immediately tell whether a missing entity was removed by includes/excludes or never visible to the ingestion role at all. Storage profile is the diff only — discovered counts (int per type) plus the existing Status.filtered list now carrying richer reasons; no discovered-name or kept-name lists, both are derivable.

Type of change:

Improvement

High-level design:

New ingestion/src/metadata/utils/filter_visibility.py — three helpers (log_discovered, log_filtered, log_step_summary) own the message format centrally so every connector family produces identical output.
Status gained discovered_counts: Dict[str, int] (in-memory only, no spec/JSON schema change) so the report can compute kept = discovered − filtered.
Base-class chokepoints instrumented in database_service.py (_get_filtered_database_names, _get_filtered_schema_names), common_db_source.py (get_tables_name_and_type for tables + views), dashboard_service.py, pipeline_service.py, messaging_service.py, storage_service.py, mlmodel_service.py — each base class's close() emits the report.
Per-connector get_database_names discovery lifted into Snowflake (replacing fix(snowflake): log discovered databases and filter-out reasons #28336's inline logging with helper calls — same output, central format), Postgres, BigQuery, Redshift, MSSQL. MySQL inherits the no-op single-DB default and needs no override.
mlmodel concrete connectors sagemaker/metadata.py and mlflow/metadata.py switched their existing filter_by_mlmodel calls to use the helper so they participate in the report.
Subtle correctness note — _get_filtered_database_names is reached from both main ingestion and the mark_databases_as_deleted maintenance pass. Discovery logging is therefore placed in each connector's get_database_names, not in *_raw, to avoid double-counting; a docstring on the base helper flags this for future contributors.

Example output (Snowflake, Cvent-like config):

======================================================================
 FILTER VISIBILITY REPORT: snowflake
======================================================================

Database (databaseFilterPattern):
  Visible to ingestion user: 10
  Filtered out (4):
    BACKUP_DB    → did not pass databaseFilterPattern, matched against 'svc.BACKUP_DB', useFqnForFiltering=True
    BACKUP_DB_2  → did not pass databaseFilterPattern, matched against 'svc.BACKUP_DB_2', useFqnForFiltering=True
    OLD_TEMP     → did not pass databaseFilterPattern, matched against 'svc.OLD_TEMP', useFqnForFiltering=True
    TEMP_DB      → did not pass databaseFilterPattern, matched against 'svc.TEMP_DB', useFqnForFiltering=True
  Will be published to OpenMetadata: 6

Schema (schemaFilterPattern):
  Visible to ingestion user: 4
  Filtered out (1):
    tmp  → did not pass schemaFilterPattern
  Will be published to OpenMetadata: 3
======================================================================

Tests:

Use cases covered

Customer hits a Snowflake/Postgres/BigQuery/Redshift/MSSQL account, configures databaseFilterPattern.excludes, and can immediately grep FILTER VISIBILITY REPORT to confirm which databases got removed and why
Same flow for dashboard (dashboardFilterPattern, projectFilterPattern), pipeline (pipelineFilterPattern), messaging (topicFilterPattern), storage (containerFilterPattern), mlmodel (mlModelFilterPattern)
Legacy status.filter(name, "Database Filtered Out") callers predating this helper still get bucketed into the right entity-type section of the report (back-compat)

Unit tests

Added ingestion/tests/unit/utils/test_filter_visibility.py — 12 tests covering count accumulation, generator inputs, optional kwargs, rich reason storage, consolidated report format, empty-state no-op, unrelated-reason skip, and legacy-reason-string back-compat
Verified: existing connector unit tests still pass (189/190 across test_snowflake, test_postgres, test_bigquery, test_redshift, test_mssql, test_mysql, test_common_db_source, test_tableau, test_looker, test_airflow, test_s3_storage, test_sagemaker — 1 skipped, 0 failed)

Backend integration tests

Not applicable (no backend API changes)

Ingestion integration tests

Not applicable (instrumentation is purely additive — same Status.filter is still called with the same "<EntityType> Filtered Out" prefix; persisted StepSummary.filtered count is unchanged)

Playwright (UI) tests

Not applicable (no UI changes)

Manual testing performed

source env/bin/activate && cd ingestion && pip install -e . --no-deps
python -m pytest ingestion/tests/unit/utils/test_filter_visibility.py -v → 12/12 pass
python -m pytest <12 connector test files> → 189/190 pass (1 skipped, 0 failed)
cd ingestion && make py_format_check → clean
End-to-end demo script — verified the consolidated report block renders correctly with discovered counts, filtered names + reasons inline, and kept counts per entity type

UI screen recording / screenshots:

Not applicable.

Checklist:

I have read the CONTRIBUTING document.
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: N/A — no spec changes.
For UI changes: N/A — no UI changes.
I have added tests (unit) and listed them above.

🤖 Generated with Claude Code

Summary by Gitar

Refined filter visibility logic:
- Improved get_filtered_count in status.py to correctly aggregate helper-driven and legacy filter() counts without double-counting.
Improved dashboard reporting:
- Coerced project_name to string in dashboard_service.py for consistent formatting in visibility reports.
Enhanced test suite:
- Added unit tests in test_filter_visibility.py to verify accuracy under mixed helper and legacy caller scenarios.

_{This will update automatically on new commits.}

Generalize Snowflake's per-database discovery + filter-reason logging (#28336) to a reusable helper used by every connector family — Database, Dashboard, Pipeline, Messaging, Storage, MLModel. At end of each source step, emit a single grep-friendly "FILTER VISIBILITY REPORT" block listing the discovered count, every filtered name + reason, and the kept count per entity type, so users can tell whether a missing entity was removed by includes/excludes or never visible to the ingestion role. Storage profile is the diff only: discovered counts (int per type) + the existing Status.filtered list (now carrying richer reasons). No discovered-name or kept-name lists; both are derivable. No spec/JSON schema change, no UI work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a reusable ingestion-side logging utility to standardize “discovered → filtered → kept” visibility across connector families, culminating in a single grep-friendly FILTER VISIBILITY REPORT emitted at step close to help users distinguish source-permission invisibility from filter-pattern exclusion.

Changes:

Introduces metadata.utils.filter_visibility helpers (log_discovered, log_filtered, log_step_summary) and wires them into multiple connector base classes and concrete sources.
Extends Status with an in-memory discovered_counts accumulator to compute “kept” counts without storing full discovered/kept name lists.
Adds unit tests for helper behavior and report formatting.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
ingestion/tests/unit/utils/test_filter_visibility.py	New unit tests validating helper accumulation and report formatting.
ingestion/src/metadata/utils/filter_visibility.py	New centralized helper module for discovery/filter logging and consolidated report emission.
ingestion/src/metadata/ingestion/source/storage/storage_service.py	Uses helper for filtered container entries and emits step summary on close.
ingestion/src/metadata/ingestion/source/pipeline/pipeline_service.py	Logs discovered pipelines, logs filtered pipelines with rich reasons, emits step summary on close.
ingestion/src/metadata/ingestion/source/mlmodel/sagemaker/metadata.py	Logs discovered/filtered ML models using helper.
ingestion/src/metadata/ingestion/source/mlmodel/mlmodel_service.py	Emits step summary on close for MLModel sources.
ingestion/src/metadata/ingestion/source/mlmodel/mlflow/metadata.py	Logs discovered/filtered ML models using helper.
ingestion/src/metadata/ingestion/source/messaging/messaging_service.py	Logs discovered/filtered topics and emits step summary on close.
ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py	Moves Snowflake database discovery/filter logging to shared helper.
ingestion/src/metadata/ingestion/source/database/redshift/metadata.py	Adds discovery + rich filter logging for databases via helper.
ingestion/src/metadata/ingestion/source/database/postgres/metadata.py	Adds discovery + rich filter logging for databases via helper.
ingestion/src/metadata/ingestion/source/database/mssql/metadata.py	Adds discovery + rich filter logging for databases via helper.
ingestion/src/metadata/ingestion/source/database/database_service.py	Instruments DB base-class filtering paths with helper-driven rich filter logging (and schema discovery logging).
ingestion/src/metadata/ingestion/source/database/common_db_source.py	Logs discovered tables/views, logs filtered tables/views with rich reasons, emits step summary on close.
ingestion/src/metadata/ingestion/source/database/bigquery/metadata.py	Adds discovery + rich filter logging for databases/schemas via helper.
ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py	Logs discovered dashboards, logs filtered dashboards/projects, emits step summary on close.
ingestion/src/metadata/ingestion/api/status.py	Adds `discovered_counts` + `record_discovered()` to support report computations.

…efenses Observability code in the ingestion hot path must not fail connectors. Adds three layers of defense: 1. Helper-internal try/except in log_discovered, log_filtered, and log_step_summary. Any failure is logged once at WARN and swallowed. 2. Call-site try/except around log_step_summary in every base class close() (database, dashboard, pipeline, messaging, storage, mlmodel) so a summary failure can't mask the real cleanup work. 3. Per-entity-type cap on Status.filtered (MAX_FILTERED_ENTRIES_PER_TYPE = 50_000). Past the cap, log_filtered only bumps a new Status.filtered_counts dict so the true count stays accurate without unbounded memory growth. The report annotates the truncation. Test coverage doubled (12 -> 23): adds bounded-growth tests for the cap, resilience tests for broken loggers / bad inputs / corrupted Status, and an integration-style multi-database lifecycle test that verifies counts are correct after the full streaming-log + summary cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- log_discovered: skip list() materialization when DEBUG logging is off. Use len() directly for Sized inputs (zero alloc) and a streaming sum() for generators. Cuts the per-discovery cost from O(n) memory to O(1) on large catalogs when --debug isn't active. - _entity_type_from_reason: case-insensitive marker match so legacy reasons like "Database Filtered out" (lowercase 'out' from older BigQuery code paths) get bucketed into the right report section instead of being silently undercounted. - dashboard_service.get_dashboard: null-safe via `or []` (matches the Optional[List[Any]] contract Pipeline / Messaging sources already use); materialize names once into a list so log_discovered can take the zero-allocation Sized path instead of re-listing a generator over the heavy dashboard objects. - common_db_source.get_tables_name_and_type / dashboard_service: skip the O(n) shallow copy when query_*_names_and_types() / get_dashboards_list() already returned a list. Only materialize when a subclass returned a generator. - Report: rename "Will be published to OpenMetadata" → "Passed filter patterns" and prepend a note explaining the count is filter-decision only (does not subtract source-side extraction errors or secondary filters like projectFilterPattern). Avoids confusion when reconciling report counts with the final ingested set. Tests: 23 → 28 (added DEBUG-gated materialization, Sized len() path, case-insensitive marker, and clarifying-note assertions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.

github-actions · 2026-05-22T02:17:23Z

🔴 Playwright Results — 1 failure(s), 14 flaky

✅ 4144 passed · ❌ 1 failed · 🟡 14 flaky · ⏭️ 91 skipped

Shard	Passed	Failed	Flaky	Skipped
🟡 Shard 1	298	0	1	4
🔴 Shard 2	777	1	3	12
🟡 Shard 3	776	0	3	8
🟡 Shard 4	787	0	1	12
🟡 Shard 5	772	0	1	47
🟡 Shard 6	734	0	5	8

Genuine Failures (failed on all attempts)

❌ Features/IncidentManager.spec.ts › Next, Previous and page indicator (shard 2)

Error: Wait for the pipeline to be successful

�[2mexpect(�[22m�[31mreceived�[39m�[2m).�[22mtoBe�[2m(�[22m�[32mexpected�[39m�[2m) // Object.is equality�[22m

Expected: �[32m"success"�[39m
Received: �[31m"running"�[39m

Call Log:
- Test timeout of 180000ms exceeded

🟡 14 flaky test(s) (passed on retry)

Features/CustomizeDetailPage.spec.ts › Glossary Term - customization should work (shard 1, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
Features/IncidentManager.spec.ts › Complete Incident lifecycle with table owner (shard 2, 2 retries)
Features/KnowledgeCenter.spec.ts › Article mentions in description should working for Knowledge Center (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/Tasks/ActivityFeed.spec.ts › creating task should immediately appear in entity feed (shard 3, 1 retry)
Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
Pages/Entity.spec.ts › Announcement create, edit & delete (shard 4, 1 retry)
Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
Pages/ExplorePageRightPanel.spec.ts › Should clear description for table (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/TasksUIFlow.spec.ts › Create and reject tag task for Topic via UI (shard 6, 1 retry)
Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

- mark_databases_as_deleted: pass add_to_status=False to _get_filtered_database_names. Without this, every filtered database was being recorded twice (once during main ingestion via the per-connector get_database_names, once during the maintenance pass), inflating Status.filtered + filtered_counts and skewing the report. - _get_filtered_schema_names: only materialize the raw schema names when add_to_status=True (where log_discovered needs the count up front). The mark-deleted maintenance paths iterate once with add_to_status=False — streaming saves O(n) memory on large schemas. Same fix applied to BigQuery's override. - dashboard/pipeline/messaging get_X(): compute names once into (entity, name) tuples instead of calling the connector-specific get_*_name() twice per entity (once to build the discovery list, once inside the filter loop). Real savings when name extraction is non-trivial (e.g., Tableau workbook lookups). - DB connector log_filtered calls: use the raw entity name (database_name, schema_name, table_name, view_name, new_database, project_id) as `name` and the FQN as `matched_against`. The previous convention of `name=<fqn>` produced confusing report output when useFqnForFiltering was False ("filtered name = FQN, matched against = raw name") and was inconsistent with non-SQL connectors (Dashboard/Pipeline/Topic/ MLModel) which already stored raw names. Report column is now readable across all connector families. Tests: existing 33 still pass (no test changes needed — the storage-key change to raw names is verified by the existing dashboard/pipeline/topic assertions which always asserted raw names). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three categories of new basedpyright errors introduced by the filter visibility refactor — all surfaced by the `--baselinemode=discard` CI gate: - log_step_summary: source_name was typed `str` but every call site passes `self.config.serviceName` which pyright infers as `str | None`. Widened to `str | None` and gracefully fall back to '<unknown source>' in the report header. - filter_by_schema / filter_by_table / filter_by_database: I extracted the `filter_name = X if useFqn else Y` ternary into a variable across several connectors. The inline ternary was fine but the extracted variable became `str | None` to pyright (because one branch is an FQN typed as Optional). Widened these three helpers in filters.py to accept `Optional[str]` — they already delegate to `_filter()` which handles None per its own contract. Aligns the wrapper signatures with the actual delegate behavior; one-file fix instead of casting at 9 call sites. - mlflow.get_mlmodels: the existing `cast(RegisteredModel, ...)` told pyright the search call returned a single model, so my `list(...)` wrap produced `list[tuple[str, Any]]` and broke `m.name` access. Fixed the cast to `cast(List[RegisteredModel], list(...))`. That surfaced a pre-existing latent bug: `model.latest_versions` is Optional but was iterated directly — guarded with `or []`. Verification: - `basedpyright -p pyproject.toml --baselinefile .basedpyright/baseline.json --baselinemode=discard` now reports zero new errors (the remaining errors are env-specific airflow/locust/pydoris import warnings already in the baseline; CI has those packages installed). - `make py_format_check` clean. - 79/79 helper + filter_pattern + common_db_source + snowflake tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-05-24T22:43:17Z

Code Review ✅ Approved 4 resolved / 4 findings

Generalizes filter visibility reporting across all connector families to provide clear, grep-friendly logs for entity filtering. All prior review findings regarding count aggregation, dashboard project naming, and memory management have been resolved.

✅ 4 resolved

✅ Performance: Materializing full dashboard list may cause memory pressure

📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:575-581
The change at line 575 converts self.get_dashboards_list() from a lazy iterator/generator into a fully materialized list(...). Many concrete dashboard connectors (e.g., Tableau) implement this as a generator with pagination (yield from). For large environments with tens of thousands of dashboards, this unnecessarily holds the entire set of dashboard objects in memory simultaneously — both for the list itself and for the generator expression passed to log_discovered on line 580 which iterates names.

Additionally, the generator expression (self.get_dashboard_name(d) for d in dashboards) on line 580 is consumed by log_discovered which calls list(names) internally, creating a second full list of all dashboard names.

Suggested approach: compute only the count without storing names, or iterate once and collect just the names (strings are much cheaper than full dashboard objects).

✅ Edge Case: 'kept' count may be misleading when dashboards fail detail extraction

📄 ingestion/src/metadata/utils/filter_visibility.py:147-149 📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:607-621
In _format_entity_section (filter_visibility.py:148), the 'kept' count is discovered - filtered_count. However in dashboard_service.py, dashboards that pass the filter pattern but then fail in get_dashboard_details() (line 620-623) or get rejected by projectFilterPattern (logged as entity_type 'Project', not 'Dashboard') are not counted as 'Dashboard Filtered Out'. This means the report may show 'Will be published to OpenMetadata: N' where N is higher than the actual number ingested.

This is inherent to the design (the report shows filter-pattern rejections specifically, not all skip reasons), but it could confuse users trying to reconcile counts. Consider adding a brief note in the report header or documentation clarifying that 'kept' means 'passed filter patterns' not 'successfully ingested'.

✅ Edge Case: log_filtered called with list instead of string for project_name

📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:604-617
At dashboard_service.py line 612-617, log_filtered is called with project_name as the name argument. However, on line 604-606, project_name is reassigned to project_names (a list) when the list is non-empty. Passing a list to log_filtered where a str is expected will result in the filter reason storing a stringified list (e.g., "['proj1', 'proj2']") in status.filtered, which produces ugly output in the report and may not match expected parsing in _entity_type_from_reason.

✅ Edge Case: get_filtered_count undercounts in mixed helper+legacy usage

📄 ingestion/src/metadata/ingestion/api/status.py:109-116
When both the log_filtered helper (which increments filtered_counts) and legacy direct status.filter() calls (which only appends to filtered) are used on the same Status instance AND the helper path hits the per-type cap, max(len(self.filtered), sum(self.filtered_counts.values())) will undercount.

Example: helper filters 60,000 tables (cap stores 50k names), legacy filters 5 schemas directly → len(filtered)=50005, sum(filtered_counts)=60000, max()=60000 but true total is 60,005.

In practice this requires >50k filtered items of a single type plus legacy callers on the same status, so impact is minimal. A precise formula would count legacy entries (those without a matching filtered_counts key) separately.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

sonarqubecloud · 2026-05-24T23:55:21Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
12 New issues
0 Accepted issues

Measures
0 Security Hotspots
73.2% Coverage on New Code
6.8% Duplication on New Code

See analysis details on SonarQube Cloud

gitar-bot · 2026-06-01T23:57:22Z

Code Review ✅ Approved 4 resolved / 4 findings

Generalizes filter visibility reporting across all connector families to provide clear, grep-friendly logs for entity filtering. All prior review findings regarding count aggregation, dashboard project naming, and memory management have been resolved.

✅ 4 resolved

✅ Performance: Materializing full dashboard list may cause memory pressure

📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:575-581
The change at line 575 converts self.get_dashboards_list() from a lazy iterator/generator into a fully materialized list(...). Many concrete dashboard connectors (e.g., Tableau) implement this as a generator with pagination (yield from). For large environments with tens of thousands of dashboards, this unnecessarily holds the entire set of dashboard objects in memory simultaneously — both for the list itself and for the generator expression passed to log_discovered on line 580 which iterates names.

Additionally, the generator expression (self.get_dashboard_name(d) for d in dashboards) on line 580 is consumed by log_discovered which calls list(names) internally, creating a second full list of all dashboard names.

Suggested approach: compute only the count without storing names, or iterate once and collect just the names (strings are much cheaper than full dashboard objects).

✅ Edge Case: 'kept' count may be misleading when dashboards fail detail extraction

📄 ingestion/src/metadata/utils/filter_visibility.py:147-149 📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:607-621
In _format_entity_section (filter_visibility.py:148), the 'kept' count is discovered - filtered_count. However in dashboard_service.py, dashboards that pass the filter pattern but then fail in get_dashboard_details() (line 620-623) or get rejected by projectFilterPattern (logged as entity_type 'Project', not 'Dashboard') are not counted as 'Dashboard Filtered Out'. This means the report may show 'Will be published to OpenMetadata: N' where N is higher than the actual number ingested.

This is inherent to the design (the report shows filter-pattern rejections specifically, not all skip reasons), but it could confuse users trying to reconcile counts. Consider adding a brief note in the report header or documentation clarifying that 'kept' means 'passed filter patterns' not 'successfully ingested'.

✅ Edge Case: log_filtered called with list instead of string for project_name

📄 ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py:604-617
At dashboard_service.py line 612-617, log_filtered is called with project_name as the name argument. However, on line 604-606, project_name is reassigned to project_names (a list) when the list is non-empty. Passing a list to log_filtered where a str is expected will result in the filter reason storing a stringified list (e.g., "['proj1', 'proj2']") in status.filtered, which produces ugly output in the report and may not match expected parsing in _entity_type_from_reason.

✅ Edge Case: get_filtered_count undercounts in mixed helper+legacy usage

📄 ingestion/src/metadata/ingestion/api/status.py:109-116
When both the log_filtered helper (which increments filtered_counts) and legacy direct status.filter() calls (which only appends to filtered) are used on the same Status instance AND the helper path hits the per-type cap, max(len(self.filtered), sum(self.filtered_counts.values())) will undercount.

Example: helper filters 60,000 tables (cap stores 50k names), legacy filters 5 schemas directly → len(filtered)=50005, sum(filtered_counts)=60000, max()=60000 but true total is 60,005.

In practice this requires >50k filtered items of a single type plus legacy callers on the same status, so impact is minimal. A precise formula would count legacy entries (those without a matching filtered_counts key) separately.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

harshach requested a review from a team as a code owner May 21, 2026 20:40

Copilot AI review requested due to automatic review settings May 21, 2026 20:40

Copilot started reviewing on behalf of harshach May 21, 2026 20:40 View session

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 21, 2026

gitar-bot Bot reviewed May 21, 2026

View reviewed changes

Comment thread ingestion/src/metadata/ingestion/source/dashboard/dashboard_service.py Outdated

gitar-bot Bot reviewed May 21, 2026

View reviewed changes

Comment thread ingestion/src/metadata/utils/filter_visibility.py Outdated

Copilot AI reviewed May 21, 2026

View reviewed changes

harshach had a problem deploying to test May 21, 2026 20:50 — with GitHub Actions Error

harshach had a problem deploying to test May 21, 2026 21:05 — with GitHub Actions Error

harshach had a problem deploying to test May 21, 2026 21:05 — with GitHub Actions Failure

harshach had a problem deploying to test May 21, 2026 21:05 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings May 21, 2026 22:21

Copilot started reviewing on behalf of harshach May 21, 2026 22:21 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

harshach had a problem deploying to test May 21, 2026 22:32 — with GitHub Actions Error

Copilot started reviewing on behalf of harshach May 21, 2026 23:58 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

harshach temporarily deployed to test May 22, 2026 00:08 — with GitHub Actions Inactive

harshach temporarily deployed to test May 22, 2026 16:03 — with GitHub Actions Inactive

Copilot AI review requested due to automatic review settings May 24, 2026 22:41

Copilot started reviewing on behalf of harshach May 24, 2026 22:41 View session

Copilot AI reviewed May 24, 2026

harshach temporarily deployed to test May 24, 2026 22:53 — with GitHub Actions Inactive

harshach had a problem deploying to test May 24, 2026 22:53 — with GitHub Actions Failure

harshach closed this Jun 1, 2026

Conversation

harshach commented May 21, 2026 • edited by gitar-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

High-level design:

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Checklist:

Summary by Gitar

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Playwright Results — 1 failure(s), 14 flaky

Genuine Failures (failed on all attempts)

Uh oh!

gitar-bot Bot commented May 24, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 24, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

gitar-bot Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

harshach commented May 21, 2026 •

edited by gitar-bot Bot

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

gitar-bot Bot commented Jun 1, 2026 •

edited

Loading